Developer(s) | LLVM Developer Group |
---|---|
Initial release | 2003 |
Stable release | 3.0 / 1 December 2011 |
Written in | C++ |
Operating system | Cross-platform |
Type | Compiler |
License | University of Illinois Open Source License[1] |
Website | llvm.org |
LLVM (formerly Low Level Virtual Machine) is a compiler infrastructure written in C++ that is designed for compile-time, link-time, run-time, and "idle-time" optimization of programs written in arbitrary programming languages. Originally implemented for C/C++, the language-agnostic design (and the success) of LLVM has since spawned a wide variety of front ends (Clang among them) for languages including Objective-C, Fortran, Ada, Haskell, Java bytecode, Python, Ruby, ActionScript, GLSL, and others.
The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois Open Source License,[1] a BSD-style license. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems.[2] LLVM is an integral part of Apple's latest development tools for Mac OS X and iOS.[3]
The name "LLVM" was originally an acronym for "Low Level Virtual Machine", but the acronym caused widespread confusion because virtual machines are just one of the many things that LLVM can be used to build. As the scope of the project grew even further, LLVM became an umbrella project that includes a variety of other compiler and low-level tool technologies as well, making the name even less apt. As such, the project abandoned[4] the acronym. Now, LLVM is a "brand" that applies to the LLVM umbrella project, the LLVM intermediate representation, the LLVM debugger, the LLVM C++ standard library, etc.
LLVM can provide the middle layers of a complete compiler system, taking intermediate form (IF) code from a compiler and emitting an optimized IF. This new IF can then be converted and linked into machine-dependent assembly code for a target platform. LLVM can accept the IF from the GCC toolchain, allowing it to be used with a wide array of extant compilers written for that project.
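The "IF in, optimized IF out" role described above can be illustrated with a toy optimizer. This is a minimal sketch, not LLVM's actual API or pass set: the `Instr` record, the two passes (`fold_constants`, `eliminate_dead`) and the `optimize` driver are hypothetical names invented for this example, and the instruction format is far simpler than the real LLVM intermediate representation.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union

Operand = Union[int, str]  # an integer constant or the name of a register

# Operations this sketch knows how to evaluate at compile time (an assumption).
FOLDABLE = {"add": operator.add, "mul": operator.mul}


@dataclass(frozen=True)
class Instr:
    dest: str                  # register written by this instruction
    op: str                    # "const", "add" or "mul"
    args: Tuple[Operand, ...]  # constants or register names


def fold_constants(code: List[Instr]) -> List[Instr]:
    """Evaluate instructions whose operands are all known constants."""
    known: Dict[str, int] = {}
    out: List[Instr] = []
    for ins in code:
        # Substitute registers whose constant value is already known.
        args = tuple(known.get(a, a) if isinstance(a, str) else a for a in ins.args)
        if ins.op == "const":
            known[ins.dest] = args[0]
        elif ins.op in FOLDABLE and all(isinstance(a, int) for a in args):
            value = FOLDABLE[ins.op](*args)
            known[ins.dest] = value
            out.append(Instr(ins.dest, "const", (value,)))
            continue
        out.append(Instr(ins.dest, ins.op, args))
    return out


def eliminate_dead(code: List[Instr], live_out: Iterable[str]) -> List[Instr]:
    """Drop instructions whose result is never needed, scanning backwards."""
    live = set(live_out)
    kept: List[Instr] = []
    for ins in reversed(code):
        if ins.dest in live:
            kept.append(ins)
            live.update(a for a in ins.args if isinstance(a, str))
    kept.reverse()
    return kept


def optimize(code: List[Instr], live_out: Iterable[str]) -> List[Instr]:
    """Run the passes in sequence: input and output use the same IF."""
    return eliminate_dead(fold_constants(code), live_out)


program = [
    Instr("a", "const", (2,)),
    Instr("b", "const", (3,)),
    Instr("c", "add", ("a", "b")),
    Instr("d", "mul", ("c", "x")),  # "x" is an input unknown at compile time
]
optimized = optimize(program, live_out=["d"])
print(optimized)  # a single instruction remains: d = mul 5, x
```

Because each pass consumes and produces the same representation, passes can be chained in any order, which is the property that lets an optimizer sit between independent front ends and back ends.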
LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at run-time.
LLVM supports a language-independent instruction set and type system. Each instruction is in static single assignment (SSA) form, meaning that each variable (called a typed register) is assigned exactly once and never changes afterwards. This simplifies the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late compilation from the IF to machine code by a just-in-time (JIT) compiler, in a fashion similar to Java. The type system consists of basic types such as integers or floats and five derived types: pointers, arrays, vectors, structures, and functions. A type construct in a concrete language can be represented by combining these types in LLVM. For example, a class in C++ can be represented by a combination of structures, functions and arrays of function pointers.
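The single-assignment property can be made concrete with a small renaming routine. The sketch below is an illustration only: it handles straight-line code without branches (so it needs no phi nodes), the `to_ssa` function and the tuple format are hypothetical, and the `%name.N` spelling of the renamed registers is an assumption chosen for readability rather than LLVM's exact naming scheme.

```python
from typing import Dict, List, Tuple

# (target variable, operation, operand names); a hypothetical toy format.
Assignment = Tuple[str, str, Tuple[str, ...]]


def to_ssa(code: List[Assignment]) -> List[Assignment]:
    """Rename variables so that every name is assigned exactly once."""
    version: Dict[str, int] = {}  # how many times each variable was assigned
    current: Dict[str, str] = {}  # latest SSA name for each source variable
    result: List[Assignment] = []
    for target, op, operands in code:
        # Operands refer to the most recent definition; inputs keep their name.
        renamed = tuple(current.get(name, name) for name in operands)
        version[target] = version.get(target, 0) + 1
        new_name = f"%{target}.{version[target]}"
        current[target] = new_name
        result.append((new_name, op, renamed))
    return result


source = [
    ("x", "add", ("a", "b")),  # x = a + b
    ("x", "mul", ("x", "c")),  # x = x * c   (x is reassigned)
    ("y", "add", ("x", "x")),  # y = x + x
]
ssa = to_ssa(source)
for line in ssa:
    print(line)
# ('%x.1', 'add', ('a', 'b'))
# ('%x.2', 'mul', ('%x.1', 'c'))
# ('%y.1', 'add', ('%x.2', '%x.2'))
```

After renaming, the question "which definition does this use refer to?" is answered by the name alone, which is why dependency analysis becomes simpler.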
The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.[5] Graphics code within the OpenGL stack was left in intermediate form, and then compiled when run on the target machine. On systems with high-end GPUs, the resulting code was quite thin, passing the instructions onto the GPU with minimal changes. On systems with low-end GPUs, LLVM would compile optional procedures that run on the local central processing unit (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME shell to allow it to run without a GPU.[6]
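The idea of removing static branches once the environment is known can be sketched as a small specializer. This is a hedged illustration of partial evaluation in general, not the code used in the OpenGL stack: the `Call` and `IfOption` node types, the `specialize` function and every option and routine name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Call:
    name: str  # a routine to run, for example on the GPU or on the CPU


@dataclass(frozen=True)
class IfOption:
    option: str  # name of an option that is fixed for a given machine
    then: "Node"
    otherwise: "Node"


Node = Union[Call, IfOption]


def specialize(node: Node, known: Dict[str, bool]) -> Node:
    """Remove every branch whose option value is already known."""
    if isinstance(node, Call):
        return node
    then = specialize(node.then, known)
    otherwise = specialize(node.otherwise, known)
    if node.option in known:
        # The test is static in this environment: keep only the taken side.
        return then if known[node.option] else otherwise
    return IfOption(node.option, then, otherwise)


# Hypothetical pipeline stage: use the GPU if it can do the work, else the CPU.
stage = IfOption(
    "gpu_has_vertex_shaders",
    Call("gpu_vertex_shader"),
    IfOption("cpu_has_simd", Call("cpu_emulate_simd"), Call("cpu_emulate_scalar")),
)

high_end = specialize(stage, {"gpu_has_vertex_shaders": True})
low_end = specialize(stage, {"gpu_has_vertex_shaders": False, "cpu_has_simd": True})
print(high_end)  # Call(name='gpu_vertex_shader')
print(low_end)   # Call(name='cpu_emulate_simd')
```

On the "high-end" configuration the specialized stage collapses to a single GPU call, mirroring the thin code path described above, while the "low-end" configuration keeps only the CPU emulation routine it actually needs.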
In benchmarks of raw performance, by contrast, LLVM 2.9 trails GCC 4.6.1 in code quality (speed of the compiled programs) by about 10% on average, while compiling 20-30% faster.[7][8]
LLVM was originally written to be a replacement for the existing code generator in the GCC stack,[9] and many of the GCC front ends have been modified to work with it. LLVM currently supports compiling of Ada, C, C++, D, Fortran, and Objective-C, using various front ends, some derived from versions 4.0.1 and 4.2 of the GNU Compiler Collection (GCC).
Widespread interest in LLVM has led to a number of efforts to develop entirely new front ends for a variety of languages. The one that has received the most attention is Clang, a new compiler supporting C, Objective-C and C++. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a modern system that integrates more easily with integrated development environments (IDEs) and has wider support for multithreading. Objective-C development under GCC was stagnant, and Apple's changes to the language were supported in a separately maintained branch. Creating its own compiler let Apple address many of the same problems LLVM addressed for IDE integration and other modern features, while also making the main development branch the main Objective-C implementation.
The Utrecht Haskell compiler can generate code for LLVM. Although the generator is in the early stages of development, it has been shown in many cases to be more efficient than the C code generator.[10] The Glasgow Haskell Compiler (GHC) has a working backend for LLVM that achieves a 30% speed-up of the compiled code when compared to native code compiling via GHC or C code generation followed by compilation, while missing only one of the many optimization techniques implemented by GHC.[11]
There are many other components in various stages of development, including, but not limited to, a Java bytecode front end, a Common Intermediate Language (CIL) front end, a CPython front end,[12] the MacRuby implementation of Ruby 1.9, various front ends for Standard ML, and a new graph coloring register allocator.